CC-SLIQ: Performance Enhancement with 2k Split Points in SLIQ Decision Tree Algorithm

Author

  • Narasimha Prasad
Abstract

Decision trees have proven very effective for classification in the emerging field of data mining. This paper proposes a new method, CC-SLIQ (Cascading Clustering and Supervised Learning In Quest), to improve the performance of the SLIQ decision tree algorithm. The drawback of SLIQ is that, to decide which attribute to split at each node, a large number of Gini indices must be computed for all attributes and for each pair of successive values, over all records that have not yet been classified. SLIQ employs a presorting technique in the tree growth phase that strongly affects its ability to find the best split at a decision tree node. The proposed model eliminates the need to sort the data at every node of the decision tree; instead, the training data are segmented by k-means clustering only once for every numeric attribute, at the beginning of the tree growth phase. The CC-SLIQ algorithm inexpensively evaluates a number of split points equal to twice the number of clusters k, and produces a compact and accurate tree that scales to large datasets as well as to datasets with a large number of attributes, classes, and records. The classification accuracy of this technique has been compared with the existing SLIQ and Elegant decision tree methods on a large number of datasets from the UCI Machine Learning Repository and Weather Underground. The experiments show that the proposed algorithm reduces the computation of split points by 95.62% and the number of decision rules generated by 56.5%, and also achieves a better mean classification accuracy of 79.29%, making it a practical tool for data mining.
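The idea in the abstract can be illustrated with a short Python sketch. This is not the paper's implementation: the abstract does not say how the 2k split points are derived from the clusters, so the sketch assumes they are the minimum and maximum value of each of the k clusters. The function names (`kmeans_1d`, `candidate_split_points`, `best_split`) and the evenly spaced initial centres are likewise illustrative choices.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini index of the binary split 'value <= threshold'."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on one numeric attribute; returns the non-empty clusters."""
    lo, hi = min(values), max(values)
    # Illustrative initialisation: centres evenly spaced over the attribute range.
    centres = [lo + (hi - lo) * (2 * i + 1) / (2 * k) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        new_centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
        if new_centres == centres:  # assignments are stable, stop early
            break
        centres = new_centres
    return [c for c in clusters if c]


def candidate_split_points(values, k):
    """ASSUMPTION: the (at most) 2k candidates are each cluster's min and max."""
    points = set()
    for cluster in kmeans_1d(values, k):
        points.add(min(cluster))
        points.add(max(cluster))
    return sorted(points)


def best_split(values, labels, k):
    """Evaluate the Gini index only at the cluster-derived candidates."""
    candidates = candidate_split_points(values, k)
    return min(candidates, key=lambda t: split_gini(values, labels, t))


if __name__ == "__main__":
    values = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(candidate_split_points(values, 2))
    print(best_split(values, labels, 2))
```

Under this assumption the clustering runs once per numeric attribute, and at most 2k thresholds are scored per attribute instead of one threshold per pair of successive values.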


Similar resources

Elegant Decision Tree Algorithm for Classification in Data Mining

Decision trees have been found very effective for classification, especially in data mining. This paper aims at improving the performance of the SLIQ decision tree algorithm (Mehta et al., 1996) for classification in data mining. The drawback of this algorithm is that a large number of Gini indices have to be computed at each node of the decision tree. In order to decide which attribute is to be spl...


SLEAS: Supervised Learning using Entropy as Attribute Selection Measure

Scaling up widely used decision tree learning algorithms to huge datasets is of emerging importance. Although many diverse methodologies have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is still needed. This paper aims at improving the performance of the SLIQ (Supervised Le...


Fuzzy SLIQ Decision Tree Based on Classification Sensitivity

The determination of the membership function is critical to fuzzy decision tree induction. Unfortunately, commonly used heuristics, such as SLIQ, behave pathologically: the attribute tests at split nodes tend to select a crisp partition. Hence, for the induction of a binary fuzzy tree, this paper proposes a method depending on the sensitivity degree of attributes to all classes of...


SLIQ: A Fast Scalable Classifier for Data Mining

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier...


SLIQ: A Fast Scalable Classifier for Data Mining

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SL...




Publication date: 2014